Minimum Message Length based Mixture Modelling using Bivariate von Mises Distributions with Applications to Bioinformatics
Abstract
Empirically observed data are commonly modelled using mixtures of probability distributions. To model angular data, directional probability distributions such as the bivariate von Mises (BVM) distribution are typically used. The critical task in mixture modelling is to determine the optimal number of component probability distributions. We employ the Bayesian information-theoretic principle of minimum message length (MML) to distinguish between mixture models by balancing the trade-off between a model's complexity and its goodness-of-fit to the data. We consider the problem of modelling angular data resulting from the spatial arrangement of protein structures using BVM distributions. The main contributions of the paper include the development of the mixture modelling apparatus along with the MML estimation of the parameters of the BVM distribution. We demonstrate that statistical inference using the MML framework supersedes the traditional methods and offers a mechanism to objectively determine models that are of practical significance.
Related articles
Unsupervised Learning of Gamma Mixture Models Using Minimum Message Length
Mixture modelling or unsupervised classification is a problem of identifying and modelling components in a body of data. Earlier work in mixture modelling using Minimum Message Length (MML) includes the multinomial and Gaussian distributions (Wallace and Boulton, 1968), the von Mises circular and Poisson distributions (Wallace and Dowe, 1994, 2000) and the distribution (Agusta and Dowe, 2002a, ...
Modelling of directional data using Kent distributions
The modelling of data on a spherical surface requires the consideration of directional probability distributions. To model asymmetrically distributed data on a three-dimensional sphere, Kent distributions are often used. The moment estimates of the parameters are typically used in modelling tasks involving Kent distributions. However, these lack a rigorous statistical treatment. The focus of th...
MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also consistent and efficient. We provide a brief overview of MML inductive inference (Wallace and Boulton (1968), Wallace and Freeman (1987)), and how it has both an information-theoretic and a Bayesian interpretation. We then outline how MML is used for statistical parameter estimation, and how the MML mix...
Efficiency of the pseudolikelihood for multivariate normal and von Mises distributions
In certain circumstances inference based on the likelihood function can be hindered by, for example, computational complexity; new applications of directional statistics to bioinformatics problems give many obvious examples. In such cases it is necessary to seek an alternative method of estimation. Two pseudolikelihoods, each based on conditional distributions, are assessed in terms of their ef...
Some Fundamental Properties of a Multivariate von Mises Distribution
In application areas like bioinformatics, multivariate distributions on angles are encountered which show significant clustering. One approach to statistical modelling of such situations is to use mixtures of unimodal distributions. In the literature (Mardia et al., 2011), the multivariate von Mises distribution, also known as the multivariate sine distribution, has been suggested for components...